Spoken Text Difficulty Estimation Using Linguistic Features

Authors

  • Su-Youn Yoon
  • Yeonsuk Cho
  • Diane Napolitano
Abstract

We present an automated method for estimating the difficulty of spoken texts for use in generating items that assess non-native learners' listening proficiency. We collected information on the perceived difficulty of listening to various English monologue speech samples using a Likert-scale questionnaire distributed to 15 non-native English learners. We averaged the overall ratings provided by three non-native learners at different proficiency levels into an overall score of listenability. We then trained a multiple linear regression model with the listenability score as the dependent variable and features from both natural language processing and speech processing as the independent variables. Our method demonstrated a correlation of 0.76 with the listenability score, comparable to the agreement between the non-native learners' ratings and the listenability score.
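The pipeline described in the abstract (average the raters' scores into a listenability score, fit a multiple linear regression on features, then report a correlation) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, the two feature columns are hypothetical stand-ins for the linguistic and speech features, and the correlation is computed on the training data rather than on held-out samples.

```python
import numpy as np

def average_ratings(ratings):
    """Average per-rater scores (rows = speech samples, columns = raters)."""
    return np.asarray(ratings, dtype=float).mean(axis=1)

def fit_linear_regression(features, target):
    """Ordinary least squares with an intercept term."""
    X = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def predict(coef, features):
    """Apply fitted coefficients (intercept first) to a feature matrix."""
    X = np.column_stack([np.ones(len(features)), features])
    return X @ coef

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic data: all values below are assumptions made for illustration.
rng = np.random.default_rng(0)
n_samples = 40
# Hypothetical features, e.g. a speaking-rate measure and a vocabulary measure.
features = rng.normal(size=(n_samples, 2))
underlying = 3.0 + 0.8 * features[:, 0] - 0.5 * features[:, 1]
# Three simulated raters per sample, each adding independent noise.
ratings = underlying[:, None] + rng.normal(scale=0.5, size=(n_samples, 3))

listenability = average_ratings(ratings)        # dependent variable
coef = fit_linear_regression(features, listenability)
predicted = predict(coef, features)
r = pearson(predicted, listenability)
print(f"in-sample correlation: {r:.2f}")
```

A real evaluation would typically measure the correlation on samples that were not used for fitting, for example through cross-validation.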

Similar resources

Organization of Gatekeeping and Mental Framework in the System of Representation and Hierarchical Relational Structures of the Modern Society

Critical discourse analysis as a type of social practice reveals how linguistic choices enable speakers to manipulate the realizations of agency and power in the representation of action. The present study examines the relationship between language and ideology and explores how such a relationship is represented in the analysis of spoken text and to show how declarative knowledge, beliefs, attit...

Linguistic Features for Quality Estimation

This paper describes a study on the contribution of linguistically-informed features to the task of quality estimation for machine translation at sentence level. A standard regression algorithm is used to build models using a combination of linguistic and non-linguistic features extracted from the input text and its machine translation. Experiments with English-Spanish translations show that lin...

Latin Etymologies as Features on BNC Text Categorization

This paper presents early experimental work on BNC Text Categorization (TC) with Latin etymologies as features, with an emphasis on spoken and written texts. Two aims were achieved in this study: (1) to explore discriminative new linguistic features rather than lots of noise-bringing "bag-of-words" (BoW); (2) to build up a base step to represent texts in distinct types of linguistic features with differe...

Linguistic Features of English Textese and Digitalk of Iranian EFL Students

This study aimed at investigating the English textese of Iranian EFL learners by scrutinizing the linguistic features through a qualitative design. In doing so, 700 messages were collected from 43 MA Iranian EFL learners of both genders. The features were categorized and analyzed calculating the frequency and percentage. The findings of the study showed that Iranian EFL students used different ...

Predicting Emotion in Spoken Dialogue from Multiple Knowledge Sources

We examine the utility of multiple types of turn-level and contextual linguistic features for automatically predicting student emotions in human-human spoken tutoring dialogues. We first annotate student turns in our corpus for negative, neutral and positive emotions. We then automatically extract features representing acoustic-prosodic and other linguistic information from the speech signal an...

Publication date: 2016